Text File | 1992-11-18 | 56.2 KB | 1,528 lines | [TEXT/MPS ] |
- C.S.M.P. Digest Sun, 15 Mar 92 Volume 1 : Issue 18
-
- Today's Topics:
-
- Determining size of file
- Opening docs in non-app-event-aware apps--SOLUTION
- Fatest code to fill memory?
-
-
- The Comp.Sys.Mac.Programmer Digest is moderated by Michael A. Kelly.
-
- These digests are available (by using FTP, account anonymous, your email
- address as password) in the pub/mac/csmp-digest directory on ftp.cs.uoregon.
- edu (try skinner.cs.uoregon.edu if that doesn't work). This is also the home
- of the comp.sys.mac.programmer Frequently Asked Questions list.
-
- These digests are also available via email. Just send a note saying that you
- want to be on the digest mailing list to mkelly@cs.uoregon.edu, and you will
- automatically receive each new digest as it is created.
-
- The articles in these digests are taken directly from comp.sys.mac.programmer.
- They are not edited; all articles included in this digest are in their original
- posted form. The only articles that are -not- included in these digests are
- those which didn't receive any replies (except those that give information
- rather than ask a question). All replies to each article are concatenated
- onto the original article in the order in which they were received. Article
- threads are not added to the digests until the last article added to the
- thread is at least one month old (this is to ensure that the thread is dead
- before adding it to the digests).
-
- Send administrative mail to mkelly@cs.uoregon.edu.
-
- -------------------------------------------------------
-
- From: crow@ccwf.cc.utexas.edu (David L. Crow)
- Subject: Determining size of file
- Date: 9 Feb 92 01:51:43 GMT
- Organization: The University of Texas at Austin, Austin TX
-
-
- I am trying to write a program that can determine the size of a file. I
- am using Think C 5.0 with System 7. In UNIX, I would use the "stat" sub-
- routine, but I am a little unsure how to get this info on the Mac. I have
- been trying to use FSOpen and PBHGetFInfo, but they aren't working as I
- would suspect from reading Inside Mac. Are these the right routines to
- use? Is there a better way? Maybe using the ANSI library instead of the
- Toolbox?
-
- Thanks!
- --
- David L. Crow crow@ccwf.cc.utexas.edu
-
-
-
- - -------------------------
-
- From: mcmath@csb1.nlm.nih.gov (Chuck McMath)
- Subject: Determining size of file
- Date: 10 Feb 92 12:53:32 GMT
- Organization: MSD
-
- In article <66447@ut-emx.uucp>, crow@ccwf.cc.utexas.edu (David L. Crow) writes:
- >
- >
- > I am trying to write a program that can determine the size of a file. I
- > am using Think C 5.0 with System 7. In UNIX, I would use the "stat" sub-
- > routine, but I am a little unsure how to get this info on the Mac. I have
- > been trying to use FSOpen and PBHGetFInfo, but they aren't working as I
- > would suspect from reading Inside Mac. Are these the right routines to
- > use? Is there a better way? Maybe using the ANSI library instead of the
- > Toolbox?
- >
- > Thanks!
- > --
- > David L. Crow crow@ccwf.cc.utexas.edu
- >
- >
-
- Open the file, then call (Pascal syntax):
-
- err := GetEOF(refNum, logEOF);
-
- logEOF === 'logical end-of-file' === size of file in bytes.
-
- Inside Mac Volume II, pages 93,112.
-
- Cheers!
-
- chuck
-
- --chuck mcmath-
- mcmath@csb1.nlm.nih.gov
- MSD, Inc. * National Library of Medicine * National Institutes of Health
- Bethesda, MD 20894
-
-
-
- - -------------------------
-
- From: keith@Apple.COM (Keith Rollin)
- Subject: Determining size of file
- Date: 10 Feb 92 21:25:26 GMT
- Organization: Apple Computer Inc., Cupertino, CA
-
- In article <66447@ut-emx.uucp> crow@ccwf.cc.utexas.edu (David L. Crow) writes:
- >
- > I am trying to write a program that can determine the size of a file. I
- > am using Think C 5.0 with System 7. In UNIX, I would use the "stat" sub-
- > routine, but I am a little unsure how to get this info on the Mac. I have
- > been trying to use FSOpen and PBHGetFInfo, but they aren't working as I
- > would suspect from reading Inside Mac. Are these the right routines to
- > use? Is there a better way? Maybe using the ANSI library instead of the
- > Toolbox?
- >
- Use PBHGetFInfo or PBGetCatInfo. You don't need to futz with FSOpen.
-
- Using the built-in File Manager calls is the best way. ANSI calls
- usually have the problem of having to translate themselves into the
- built-in calls. This takes longer, and runs the risk of losing
- something in the translation.
-
- --
- - ----------------------------------------------------------------------------
- Keith Rollin --- <Taligent .signature under construction>
- Disclaimer: Pretty soon, I really _won't_ be speaking for Apple...
-
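For comparison, the ANSI library route mentioned in the original question can be sketched as follows. This is a minimal sketch, not code from the thread: `file_size` is a hypothetical helper name, and it relies on `fseek` with `SEEK_END` working on a binary stream, which is common in practice but not strictly guaranteed by the C standard. On the Mac it would presumably report only the data fork, which is an assumption about how the ANSI library maps onto the File Manager.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the size of the file in bytes,
 * or -1 if the file cannot be opened or measured. */
long file_size(const char *path)
{
    FILE *fp;
    long size = -1L;

    assert(path != NULL);

    fp = fopen(path, "rb");            /* binary mode: no newline translation */
    if (fp == NULL)
        return -1L;

    if (fseek(fp, 0L, SEEK_END) == 0)  /* move to the end of the stream */
        size = ftell(fp);              /* offset at the end == length in bytes */

    fclose(fp);
    return size;
}
```

The stream has to be opened just to be measured, which is part of the translation overhead described above; a catalog call such as PBHGetFInfo does not need an open file.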
-
-
- - -------------------------
-
- From: oster@well.sf.ca.us (David Phillip Oster)
- Subject: Determining size of file
- Date: 13 Feb 92 13:59:31 GMT
- Organization: Whole Earth 'Lectronic Link, Sausalito, CA
-
- In article <62649@apple.Apple.COM> keith@Apple.COM (Keith Rollin) writes:
- >Use PBHGetFInfo or PBGetCatInfo. You don't need to futz with FSOpen.
- Don't you have to check the file system type before you can use these calls?
- I believe that "bad things will happen" if you try to use these on a MFS
- file system. At least FSOpen() doesn't care whether the input is MFS or HFS.
- --
- -- David Phillip Oster - At least the government doesn't make death worse.
- -- oster@well.sf.ca.us = {backbone}!well!oster
-
-
-
- - -------------------------
-
- From: jcav@quads.uchicago.edu (JohnC)
- Subject: Determining size of file
- Date: 14 Feb 92 21:50:36 GMT
- Organization: The Royal Society for Putting Things on Top of Other Things
-
- In article <29997@well.sf.ca.us> oster@well.sf.ca.us (David Phillip Oster) writes:
- >In article <62649@apple.Apple.COM> keith@Apple.COM (Keith Rollin) writes:
- >>Use PBHGetFInfo or PBGetCatInfo. You don't need to futz with FSOpen.
- >Don't you have to check the file system type before you can use these calls?
- >I believe that "bad things will happen" if you try to use these on a MFS
- >file system. At least FSOpen() doesn't care whether the input is MFS or HFS.
-
- Well, if HFS is running then the worst that can happen is that you'll get a
- "wrongVolType" error. Of course, if HFS is _not_ running then you'll get a
- system error when you try to call _PBGetCatInfo.
-
- --
- John Cavallino | EMail: jcav@midway.uchicago.edu
- University of Chicago Hospitals | John_Cavallino@uchfm.bsd.uchicago.edu
- Office of Facilities Management | USMail: 5841 S. Maryland Ave, MC 0953
- B0 f++ c+ g+ k s+(+) e+ h- pv | Chicago, IL 60637
-
-
-
- ---------------------------
-
- From: greeny@top.cis.syr.edu (Jonathan Greenfield)
- Subject: Opening docs in non-app-event-aware apps--SOLUTION
- Organization: CIS Dept., Syracuse University
- Date: Sat, 8 Feb 92 21:28:01 EST
-
- Thanks to the several people who tried to help me with this problem. It
- was actually quite frustrating to have people telling me "just send an 'odoc'
- event, and the Process Manager will take care of it" since I had already
- tried doing this.
-
- I eventually decided to mess around and see if there was some kind of "trick"
- that was required. What I discovered is that (for some reason that is
- unknown to me, and apparently undocumented) you have to await a reply when
- you send the 'odoc' event, in order for the Process Manager to properly
- open the document.
-
- Perhaps DTS should put this information in one of their "snippets" or
- something like that. (Or perhaps they already have, though I'm not aware
- of it...)
-
- In any case, the application involved, LaunchPad, was just submitted to
- the sumex archive, and should be available soon. It is a very simple
- application that allows Clean-Desktop-types to clear off their desktops,
- and still have drag-and-drop access to a lot of different applications.
-
- It's freeware, so try it out, and let me know what you think.
-
- --
- J. S. Greenfield greeny@top.cis.syr.edu
- (I like to put 'greeny' here,
- but my d*mn system wants a
- *real* name!) "What's the difference between an orange?"
-
-
-
- - -------------------------
-
- From: grobbins@Apple.COM (Grobbins)
- Subject: Opening docs in non-app-event-aware apps--SOLUTION
- Date: 11 Feb 92 09:01:42 GMT
- Organization: Apple CTS
-
- In article <1992Feb8.212801.7577@newstand.syr.edu> greeny@top.cis.syr.edu (Jonathan Greenfield) writes:
- >What I discovered is that (for some reason that is
- >unknown to me, and apparently undocumented) you have to await a reply when
- >you send the 'odoc' event, in order for the Process Manager to properly
- >open the document.
-
- What you have to do is call WaitNextEvent, since events don't get sent
- until WNE time. Using kAEWaitReply makes the Apple Event manager call
- WaitNextEvent for you, as mentioned on page 6-60 of Inside Mac VI.
-
- >Perhaps DTS should put this information in one of their "snippets" or
- >something like that. (Or perhaps they already have, though I'm not aware
- >of it...)
-
- The latest information from DTS is in the Tech Notes and the Q&A Stack.
-
- Grobbins grobbins@apple.com
-
-
-
- - -------------------------
-
- From: greeny@top.cis.syr.edu (Jonathan Greenfield)
- Subject: Opening docs in non-app-event-aware apps--SOLUTION
- Date: 13 Feb 92 18:05:13 GMT
- Organization: CIS Dept., Syracuse University
-
- In article <62679@apple.Apple.COM> grobbins@Apple.COM (Grobbins) writes:
- >In article <1992Feb8.212801.7577@newstand.syr.edu> greeny@top.cis.syr.edu (Jonathan Greenfield) writes:
- >>What I discovered is that (for some reason that is
- >>unknown to me, and apparently undocumented) you have to await a reply when
- >>you send the 'odoc' event, in order for the Process Manager to properly
- >>open the document.
- >
- >What you have to do is call WaitNextEvent, since events don't get sent
- >until WNE time. Using kAEWaitReply makes the Apple Event manager call
- >WaitNextEvent for you, as mentioned on page 6-60 of Inside Mac VI.
-
- This is not a sufficient explanation, since the Apple event is properly
- posted, without any trouble, as long as the 'odoc' does not have to
- be converted to "puppet strings."
-
- Since the AE posting behavior differs depending upon whether or not
- conversion to "puppet strings" is necessary, I can only conclude that the
- need to wait is due to the Process Manager's need to make the conversion,
- and not due to the Event Manager's method for posting the event.
-
- --
- J. S. Greenfield greeny@top.cis.syr.edu
- (I like to put 'greeny' here,
- but my d*mn system wants a
- *real* name!) "What's the difference between an orange?"
-
-
-
- ---------------------------
-
- From: taihou@iss.nus.sg (Tng Tai Hou)
- Subject: Fatest code to fill memory?
- Organization: Institute of Systems Science, NUS, Singapore
- Date: Mon, 10 Feb 1992 15:43:46 GMT
-
- Can anyone recommend sample 'c' or 680x0 assembly code to fill
- contiguous memory with some value? For example:
-
- int i;
- Ptr p;
-
- p = baseAddr;
- for (i=0; i<100; i++)
- *p++ = color;
-
- Is this the best code? I use ThinkC 5.0.
-
- Would appreciate all kinds of answer on this newsgroup.
- My little code segment is for a FTQD (Faster Than QuickDraw) 8-bit line
- drawing routine, and also a convex polygon fill routine.
-
- Thanks in advance.
-
- Tai Hou
- Singapore
-
-
-
-
- - -------------------------
-
- From: CXT105@psuvm.psu.edu (Christopher Tate)
- Subject: Fatest code to fill memory?
- Date: 10 Feb 92 21:32:15 GMT
- Organization: Penn State University
-
- In article <1992Feb10.154346.23488@nuscc.nus.sg>, taihou@iss.nus.sg (Tng Tai
- Hou) says:
- >
- >Can anyone recommend sample 'c' or 680x0 assembly code to fill
- >contiguous memory with some value? For example:
- >
- >int i;
- >Ptr p;
- >
- >p = baseAddr;
- >for (i=0; i<100; i++)
- > *p++ = color;
-
- Going with straight assembly is your best option for a highly-optimized
- application like fast graphics. Try something like:
-
- asm {
- movea baseAddr, a0 /* equivalent to Ptr p above */
- move 100, d0 /* loop counter */
- move.l 0x1C1C1C1C,d1 /* assuming 8-bit pixels set to hex 1C */
- @1: move.l d1,(a0)+ /* write 4 pixels */
- dbeq d0, @1 /* decrement d0, branch to @1 if non-zero */
- }
-
- This code is really icky, but the important part is that it writes 4
- pixels per access (assuming 8-bit pixels), and uses the DBEQ instruction
- for speed. Caches will love that tight loop.
-
- Caveats are that it also assumes that you're moving a multiple of 4
- pixels -- if you aren't, then you'll have to adjust the limits of the
- loop accordingly, and special-case the last three or fewer pixels.
- Also, it assumes that the base address is word aligned. If it isn't
- (e.g. there's an extra pixel at the beginning of a scan line), you'll
- have to special case that, too.
-
- Setting D1 to be the proper value without being able to assume that the
- value is a constant is a little more complex, but not really hard. It's
- left as an exercise to the reader. :-)
-
- - -----
- Christopher Tate | Cryptogram #7:
- cxt105@psuvm.psu.edu |
- CXT105@PSUVM.BITNET | Z XZG AYRPOVR LVTLPYTW
- - -------------------------------| YL IYQW TYDPR.
- Send me the answer; I love mail! |
-
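The caveats in the post above (unaligned start, a length that is not a multiple of 4, and a fill value that is not a constant) can be sketched in portable C. This is an illustration only, written for a modern C99 compiler rather than THINK C 5; `fill_bytes` is a hypothetical name, and `memcpy` is used for the 4-byte store to stay within C's aliasing rules (compilers typically reduce it to a single longword move).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill len bytes at dst with value,
 * four bytes at a time wherever possible. */
void fill_bytes(void *dst, unsigned char value, size_t len)
{
    unsigned char *p = dst;
    uint32_t pattern;

    assert(dst != NULL || len == 0);

    /* Replicate the byte into all four bytes of a longword. */
    pattern = value;
    pattern |= pattern << 8;
    pattern |= pattern << 16;

    /* Head: single bytes until p sits on a 4-byte boundary. */
    while (len > 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = value;
        len--;
    }

    /* Body: one longword (4 pixels at 8 bits each) per store. */
    while (len >= 4) {
        memcpy(p, &pattern, sizeof pattern);
        p += 4;
        len -= 4;
    }

    /* Tail: the last three or fewer bytes. */
    while (len > 0) {
        *p++ = value;
        len--;
    }
}
```

Calling `fill_bytes(baseAddr, color, 100)` would cover the 100-byte case from the original question regardless of how `baseAddr` is aligned.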
-
-
- - -------------------------
-
- From: jesjones@milton.u.washington.edu (Jesse Jones)
- Subject: Fatest code to fill memory?
- Organization: University of Washington, Seattle
- Date: Tue, 11 Feb 1992 00:36:54 GMT
-
-
- Chris Tate is right: assembly code is definitely the best way to go
- if you want your graphic routines to run as fast as possible. The code
- he has is fine as long as you remember that the decrement and branch
- instructions are restricted to word length counter registers.
-
- I wrote an assembly routine a while back that fills an arbitrary
- amount of memory with words. If you're stuffing longwords you can
- speed this up some. The routine is in Modula-2 and in the form of
- an inline procedure. The machine language was generated using a DA
- called Quik Hex (which I highly recommend).
-
- PROCEDURE FillMem (adr, bytes: ADDRESS; filler: WORD);
- INLINE2(321FH), (* MOVE.W (A7)+, D1 *)
- INLINE2(241FH), (* MOVE.L (A7)+, D2 *)
- INLINE2(201FH), (* MOVE.L (A7)+, D0 *)
- INLINE2(0A055H), (* StripAddress *)
- INLINE2(2040H), (* MOVE.L D0, A0 *)
- INLINE2(30C1H), (* MOVE.W D1, (A0)+ *)
- INLINE2(5582H), (* SUBQ.L #2, D2 *)
- INLINE2(6EFAH); (* BGT.S -4 *)
-
-
- --Jesse
-
-
-
-
- - -------------------------
-
- From: neeri@iis.ethz.ch (Matthias Ulrich Neeracher)
- Subject: Fatest code to fill memory?
- Date: 11 Feb 92 16:43:44 GMT
- Organization: Integrated Systems Laboratory, ETH, Zurich
-
- In article <92041.163215CXT105@psuvm.psu.edu> Christopher Tate <CXT105@psuvm.psu.edu> writes:
- >In article <1992Feb10.154346.23488@nuscc.nus.sg>, taihou@iss.nus.sg (Tng Tai
- >Hou) says:
- >>
- >>Can anyone recommend sample 'c' or 680x0 assembly code to fill
- >>contiguous memory with some value? For example:
- >>
- >>int i;
- >>Ptr p;
- >>
- >>p = baseAddr;
- >>for (i=0; i<100; i++)
- >> *p++ = color;
- >
- >Going with straight assembly is your best option for a highly-optimized
- >application like fast graphics. Try something like:
- >
- >asm {
- > movea baseAddr, a0 /* equivalent to Ptr p above */
- > move 100, d0 /* loop counter */
- > move.l 0x1C1C1C1C,d1 /* assuming 8-bit pixels set to hex 1C */
- >@1: move.l d1,(a0)+ /* write 4 pixels */
- > dbeq d0, @1 /* decrement d0, branch to @1 if non-zero */
- >}
- >
- >This code is really icky, but the important part is that it writes 4
- >pixels per access (assuming 8-bit pixels), and uses the DBEQ instruction
- >for speed. Caches will love that tight loop.
-
- But you probably can do better than that by unrolling the inner loop by a
- factor of 8, resulting in:
-
- @1 MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- DBEQ D0, @1
-
- >Caveats are that it also assumes that you're moving a multiple of 4
- >pixels -- if you aren't, then you'll have to adjust the limits of the
- >loop accordingly, and special-case the last three or fewer pixels.
- >Also, it assumes that the base address is word aligned. If it isn't
- >(e.g. there's an extra pixel at the beginning of a scan line), you'll
- >have to special case that, too.
-
- My above code is even more tricky in this respect, and you have to ask yourself
- whether the work is justified.
-
- For an interesting study in loop unrolling, take a look at Apple's
- implementation of _BlockMove.
-
- Matthias
-
- - ---
- Matthias Neeracher neeri@iis.ethz.ch
- `We say "gestalt" when things combine to act in ways we can't explain'
- -- Marvin Minsky, _The Society Of Mind_
-
-
-
- - -------------------------
-
- From: Michael_Hecht@mac.sas.com (Michael Hecht)
- Subject: Fatest code to fill memory?
- Date: 11 Feb 92 16:32:15 GMT
- Organization: SAS Institute Inc.
-
- In article <NEERI.92Feb11104344@iis.ethz.ch>,
- neeri@iis.ethz.ch (Matthias Ulrich Neeracher) writes:
- >
- > In article <92041.163215CXT105@psuvm.psu.edu>
- > Christopher Tate <CXT105@psuvm.psu.edu> writes:
- > >
- > >In article <1992Feb10.154346.23488@nuscc.nus.sg>,
- > > taihou@iss.nus.sg (Tng Tai Hou) says:
- > >>
- > >>Can anyone recommend sample 'c' or 680x0 assembly code to fill
- > >>contiguous memory with some value? For example:
- > >>
- > >>[simple byte fill loop example deleted]
- > >
- > >Going with straight assembly is your best option for a highly-optimized
- > >application like fast graphics. Try something like:
- > >
- > >[longword assembler loop example deleted]
- > >
- > >[...] it writes 4
- > >pixels per access (assuming 8-bit pixels), and uses the DBEQ instruction
- > >for speed. Caches will love that tight loop.
- >
- > But you probably can do better than that by unrolling the inner loop by a
- > factor of 8, resulting in:
- >
- >[unrolled assembler loop example deleted]
- >
- > >Caveats are that it also assumes that you're moving a multiple of 4
- > >pixels. [...] Also, it assumes that the base address is word aligned. [...]
- >
- > My above code is even more tricky in this respect, and you have to ask yourself
- > whether the work is justified.
-
-
- Here's some code I wrote a while back. It "unrolls" the loop by using the
- movem instruction to fill memory in 28-byte chunks. This code works in
- THINK C 4; I haven't yet checked that it also works with THINK C 5. However,
- the THINK C 5 manual states that register assignment is disabled for any
- function containing asm statements.
-
- Note that all the caveats mentioned above are handled here.
-
- --Michael
-
- =======================================================================
- Michael P. Hecht | Internet: Michael_Hecht@mac.sas.com
- SAS Institute Inc.; Cary, NC USA | AppleLink: SAS.HECHT
- =======================================================================
-
- /* Number of registers we can use */
- #define NREGS 7
- /* Number of bytes we can fill in one chunk */
- #define AMOUNT (NREGS*sizeof(long))
-
- /* Fill memory starting at p for len bytes with c */
- void FillMem( char *p, short len, char c )
- {
- /* Use one address register for q */
- register char * q;
-
- /* Use five data registers... */
- register long r1, r2, r3, r4, r5;
- /* ...and remaining two address registers for filling */
- register char *r6, *r7;
-
-
- /* Sanity check for len */
- if( len <= 0 )
- return;
-
- /* Replicate character to all four bytes of r1 */
- r1 = c & 0xFF;
- r1 |= r1 << 8;
- r1 |= r1 << 16;
-
- /* Fill all registers with fill character */
- r2 = r3 = r4 = r5 = r1;
- r6 = r7 = ( char * )r1;
-
- /* Align p on a long-word address */
- while(( long )p & 0x00000003 ) {
- *p++ = r1;
- if( !( --len ))
- break;
- }
-
- /*
- * Fill as many full chunks as possible.
- *
- * We have to use the predecrement mode, because the
- * 680x0 doesn't allow movem'ing registers to memory
- * in postincrement mode (that's only allowed in the
- * opposite direction).
- */
- q = p + AMOUNT;
- p += len;
- for( ; q < p; q += 2*AMOUNT )
- asm {
- movem.l r1/r2/r3/r4/r5/r6/r7,-(q)
- }
-
- /* Fill any leftover partial chunk, a byte at a time */
- q -= AMOUNT;
- for( ; q < p; )
- asm {
- move.b r1,(q)+
- }
- }
-
-
-
- - -------------------------
-
- From: suitti@ima.isc.com (Stephen Uitti)
- Subject: Fatest code to fill memory?
- Organization: Interactive Systems, Cambridge, MA 02138-5302
- Date: Tue, 11 Feb 1992 18:06:44 GMT
-
- In article <92041.163215CXT105@psuvm.psu.edu> Christopher Tate <CXT105@psuvm.psu.edu> writes:
- >In article <1992Feb10.154346.23488@nuscc.nus.sg>, taihou@iss.nus.sg (Tng Tai
- >Hou) says:
- >>Can anyone recommend sample 'c' or 680x0 assembly code to fill
- >>contiguous memory with some value? For example:
- >>
- >>int i;
- >>Ptr p;
- >>
- >>p = baseAddr;
- >>for (i=0; i<100; i++)
- >> *p++ = color;
- >
- >Going with straight assembly is your best option for a highly-optimized
- >application like fast graphics. Try something like:
- >
- >asm {
- > movea baseAddr, a0 /* equivalent to Ptr p above */
- > move 100, d0 /* loop counter */
- > move.l 0x1C1C1C1C,d1 /* assuming 8-bit pixels set to hex 1C */
- >@1: move.l d1,(a0)+ /* write 4 pixels */
- > dbeq d0, @1 /* decrement d0, branch to @1 if non-zero */
- >}
-
- Note: dbeq checks the condition codes first. You need to jump
- over the move at @1: for the first loop. Also, the code is not
- good for clearing to zeros, since the condition code will be set
- to exit the loop on the first pass. See code for "amemset", below.
- When it all comes down, I couldn't get dbeq to work right.
-
- I did this in Think C 5.02, on a Mac IIci with cache board. In
- the Code Optimization options, I had "Honor 'register' first"
- set. Note: all of these routines expect "count" to mean
- "this many long words", not "this many bytes".
-
- /* This is the code that I'd use first */
- /* Assume long aligned "buf" */
- void cmemset(register long *buf, register long value,
- register unsigned long count)
- {
- do {
- *buf++ = value;
- } while (--count);
- }
-
- /* This compiles to the same as the above code */
- void bmemset(register long *buf, register long value,
- register unsigned long count)
- {
- asm {
- @1: move.l value,(buf)+ /* do { *buf++ = value; */
- subq.l #1, count /* } while (--count); */
- bne.s @1
- }
- }
-
- /* This is how to use "dbeq" correctly.
- * Note that it will exit early (after the first move.l)
- * if "value" is zero. This is, of course, wrong.
- */
- void amemset(register long *buf, register long value,
- register unsigned long count)
- {
- asm {
- bra.s @2 /* dbeq checks first */
- @1: move.l value,(buf)+ /* do { *buf++ = value; */
- @2: dbeq count, @1 /* } while (--count); */
- }
- }
-
- Note that the thinkC "memset" ANSI routine uses one move.b per
- loop. It uses "asm", but it would have been the same code in C.
-
- For my money, I'd let the C compiler do the linkage work when doing
- assembly. Let it figure out where to put variables. Let it
- do the C calling sequence. Just check the work it did with
- the Disassemble command.
-
- Finally, let's examine Duff's device for loop unrolling.
-
- void dmemset(register long *buf, register long value,
- register unsigned long count)
- {
- register unsigned long loop;
-
- loop = (count + 8 - 1) >> 3;
- switch (count & (8 - 1)) {
- case 0:
- do {
- *buf++ = value;
- case 7:
- *buf++ = value;
- case 6:
- *buf++ = value;
- case 5:
- *buf++ = value;
- case 4:
- *buf++ = value;
- case 3:
- *buf++ = value;
- case 2:
- *buf++ = value;
- case 1:
- *buf++ = value;
- } while (--loop);
- }
- }
-
- This compiles a nicely unrolled loop that will handle any number
- of longwords. The switch gets you into the loop at the right
- spot for the first loop, and then the loop fills 8 longwords at a
- time until it is done. The loop overhead is 1/8th as much.
- Duff's device looks gross, but remember that a switch is really
- just a computed "goto", and the "case"'s are just labels. I
- believe this hack is blessed by ANSI C.
-
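One detail worth spelling out: because the loop is a do/while entered through `case 0`, a count of zero would still write eight longwords and then wrap the loop counter. A guarded variant might look like the sketch below. `dmemset_checked` is a hypothetical name, and the early return is an addition, not part of the posted routine.

```c
#include <assert.h>
#include <stddef.h>

/* Duff's device fill with a guard for count == 0.
 * count is in longwords, as in the posted routines. */
void dmemset_checked(long *buf, long value, unsigned long count)
{
    unsigned long loop;

    if (count == 0)
        return;                 /* the unguarded do/while would run anyway */

    assert(buf != NULL);

    loop = (count + 8 - 1) >> 3;    /* number of passes, rounded up */
    switch (count & (8 - 1)) {      /* jump into the middle of the first pass */
    case 0: do { *buf++ = value;
    case 7:      *buf++ = value;
    case 6:      *buf++ = value;
    case 5:      *buf++ = value;
    case 4:      *buf++ = value;
    case 3:      *buf++ = value;
    case 2:      *buf++ = value;
    case 1:      *buf++ = value;
            } while (--loop);
    }
}
```

The single test costs one compare per call, not per longword, so it should not change the timings discussed in this thread in any measurable way.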
- Think C uses "subq.l"/"bne.s", rather than "dbeq", so if you
- wanted that too, you'd have to code it in assembly. Though
- "Disassemble" can help here, it is further complicated by the
- fact that "dbeq" checks the condition codes first. "dbeq" is an
- instruction that means "check condition codes for zero, if nonzero
- subtract one, if not -1 then branch". The instruction I really
- wanted was the PDP-11 "sob" - "subtract one and branch if not
- zero". In fact, given that "dbeq" is so complicated, I wouldn't
- be surprised if "subq.l"/"bne.s" weren't faster, or at least
- nearly the same speed as "dbeq". I'd say, off hand, that "dbeq"
- is useless.
-
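The DBcc behaviour described above can be modelled in plain C to see how many times a loop body actually runs. This is only a sketch of the instruction semantics as stated in this thread (test the condition first, then decrement the low word and branch unless it reached -1); the function names are hypothetical. It assumes, as noted earlier in the post, that the move.l in the loop body sets the Z flag from the value it moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one DBcc step. Returns 1 if the branch is taken. */
static int dbcc_step(int condition_true, uint16_t *counter)
{
    assert(counter != NULL);
    if (condition_true)
        return 0;                       /* condition met: fall through, no decrement */
    *counter = (uint16_t)(*counter - 1);
    return *counter != 0xFFFFu;         /* branch unless the word wrapped to -1 */
}

/* Body runs for "@1: body ; dbra d0,@1" (DBF/DBRA: condition never true). */
unsigned long dbra_body_runs(uint16_t d0)
{
    unsigned long runs = 0;
    do {
        runs++;
    } while (dbcc_step(0, &d0));
    return runs;
}

/* Body runs for "@1: move.l value,(buf)+ ; dbeq d0,@1". */
unsigned long dbeq_body_runs(uint16_t d0, long value)
{
    unsigned long runs = 0;
    int z;
    do {
        runs++;
        z = (value == 0);               /* Z flag after moving value */
    } while (dbcc_step(z, &d0));
    return runs;
}
```

In this model a counter of N gives N+1 passes, so a loop meant to run 100 times would load 99, and a dbeq loop filling with zero stops after a single pass.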
- A short benchmark of the above routines called with
- a buffer 100,000 4 byte words long, a value of 0x40404040, and
- a count of 100,000, and each routine is called 1,000 times,
- the times on a Mac IIci with cache card were:
-
- Routine Time in seconds
- amemset 22 (dbeq) - however, it didn't seem to work right.
- bmemset 78 (subq.l/bne.s but in asm)
- cmemset 78 (subq.l/bne.s but in C)
- dmemset 78 (unrolled 8 times with subq.l/bne.s at bottom)
-
- dmemset was not faster than bmemset or cmemset. It was not the
- speedup I'd have expected. Loop unrolling does not appear to
- benefit this code.
-
- On a new topic:
-
- One thing I'd wish for in Think C is that "Disassemble" was
- completely compatible with "asm". For example, "Disassemble"
- produces code such as
-
- move.l #$01020304,D0
-
- whereas "asm" wants to see
-
- move.l #01020304,d0
-
- This is just a pain.
-
- Stephen.
- suitti@ima.isc.com
-
-
-
- - -------------------------
-
- From: d88-jwa@hemul.nada.kth.se (Jon W{tte)
- Subject: Fatest code to fill memory?
- Date: 11 Feb 92 17:28:42 GMT
- Organization: Royal Institute of Technology, Stockholm, Sweden
-
- .ch> neeri@iis.ethz.ch (Matthias Ulrich Neeracher) writes:
-
- >>Can anyone recommend sample 'c' or 680x0 assembly code to fill
- >>contiguous memory with some value? For example:
-
- >>p = baseAddr;
- >>for (i=0; i<100; i++)
- >> *p++ = color;
-
- >@1: move.l d1,(a0)+ /* write 4 pixels */
- > dbeq d0, @1 /* decrement d0, branch to @1 if non-zero */
-
- @1 MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- MOVE.L D1, (A0)+
- DBEQ D0, @1
-
- What's wrong with MOVEM.L ? That should copy 60 bytes per instruction
- fetch...
-
- For an interesting study in loop unrolling, take a look at Apple's
- implementation of _BlockMove.
-
- True. Especially on 040 ROMs where they move a cache line each time :-)
- (Couldn't you just get the address of BlockMove and call that directly ?
- That might be fast enough !)
-
-
- --
- This Signature is distributed under the conditions of the Signature License,
- available at a fee from h+@nada.kth.se (Jon W{tte) Reading the Signature
- implies that you accept to be bound by the terms in said License. Should you
- not agree on any of these terms, you must return the Signature unread to me.
-
-
-
- - -------------------------
-
- From: orpheus@reed.edu (P. Hawthorne)
- Subject: Fatest code to fill memory?
- Date: 12 Feb 92 05:53:24 GMT
- Organization: Reed College, Portland OR
-
-
- I spent too long trying to best BlockMove. I tried just about
- everything I could think of. BlockMove is weak from tight inner loops and
- short moves. For pure flat out speed on short moves from an inner loop,
- one can do better.
-
- If you never have overlapping source and destination blocks, it can
- be made even faster. Perhaps not surprisingly, it is faster in general to
- use a predecrement mode move if possible. Since the only thing that makes
- BlockMove weak is that it eats cache, one's own routines should not.
- MOVEM is great for the 68000, but the overhead of saving and restoring
- registers outweighs the reduced cache damage for tight loops, where we
- have a chance of improving on BlockMove. I was aiming at best performance
- on 68020/68030 machines, but I suspect that the MOVEM routine included
- would compare decently with BlockMove on 68040 machines.
-
- Here're some of the most successful of the routines I was pitting
- against BlockMove. Judge me not by their elegance or lack thereof. These
- reflect the experimentation process I was undergoing, and not everything
- I learned. I very much look forward to any commentary, however, first
- machine code project and all... Good, bad, atrocious?
-
- I'll defer the explanation of them until someone specifically asks,
- since I expect folks who are interested to be able to read this with
- little or no effort. Formal source will be included with the Panacea
- Class Library, which I shall be releasing Real Soon Now.
-
- Incidentally, the fellow who loaned me the m68k series manuals was
- utterly nonplussed at the idea of a fast memory copy. He thought it was
- too simple, and offered me a tight printf and a multiprocessor FFT.
-
- The general purpose mover I ultimately wrote will first decide if
- content overlaps and go to postincrement or predecrement cases
- appropriately. Those check to see if the number of characters is small,
- medium or large. If large, 4096 chars, then it calls BlockMove. It does
- not jump straight to the address of BlockMove, which it ought, I know.
-
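The overlap decision described in the paragraph above can be sketched in C. This is not the author's routine, only a byte-at-a-time illustration of choosing between the postincrement and predecrement directions; `copy_overlap_safe` is a hypothetical name, and the size thresholds and the hand-off to BlockMove are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst, correct even if the blocks overlap. */
void copy_overlap_safe(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((dst != NULL && src != NULL) || len == 0);

    if ((uintptr_t)d <= (uintptr_t)s ||
        (uintptr_t)d >= (uintptr_t)s + len) {
        /* dst is below src or past its end: copy forward (postincrement). */
        while (len--)
            *d++ = *s++;
    } else {
        /* dst starts inside src: copy backward (predecrement) so that
         * source bytes are read before they are overwritten. */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
}
```

A faster version would replace each byte loop with the unrolled longword moves shown in the listings that follow.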
- I include the Pascal INLINE for that in the next article. I'm afraid I
- lost my ResEdit copy. Oh, one more thing. Please pardon the syntax, these
- are hand assembled from machine code within ResEdit's hex editor. (Just
- don't happen to own MPW Asm)
-
- Theus (orpheus@reed.edu)
-
- Machine code source starts here. Pascal INLINEs in next article:
-
- memcpy 1 byte 0 alignment (note: byte count means bytes before looping)
- +0000 000000 MOVE.L (A7)+,D0 | 201F
- +0002 000002 MOVEA.L (A7)+,A1 | 225F
- +0004 000004 MOVEA.L (A7)+,A0 | 205F
- +0006 000006 SUBQ.L #$1,D0 | 5380
- +0008 000008 BMI.S memcpy+$0018 | 6B0E
- +000A 00000A MOVE.B (A0)+,(A1)+ | 12D8
- +000C 00000C DBF D0,memcpy+$000A | 51C8 FFFC
- +0010 000010 SUBI.L #$00010000,D0 | 0480 0001 0000
- +0016 000016 BGT.S memcpy+$000A | 6EF2
-
- memcpy 2 byte 0 alignment
- +0000 000000 MOVE.L (A7)+,D0 | 201F
- +0002 000002 MOVEA.L (A7)+,A1 | 225F
- +0004 000004 MOVEA.L (A7)+,A0 | 205F
- +0006 000006 LSR.L #$1,D0 | E288
- +0008 000008 BCC.S memcpy+$000E | 6404
- +000A 00000A MOVE.B (A0)+,(A1)+ | 12D8
- +000C 00000C TST.L D0 | 4A80
- +000E 00000E BEQ.S memcpy+$0020 | 6710
- +0010 000010 SUBQ.L #$1,D0 | 5380
- +0012 000012 MOVE.W (A0)+,(A1)+ | 32D8
- +0014 000014 DBF D0,memcpy+$0012 | 51C8 FFFC
- +0018 000018 SUBI.L #$00010000,D0 | 0480 0001 0000
- +001E 00001E BGT.S memcpy+$0012 | 6EF2
-
- memcpy 8 byte 0 alignment
- +0000 000000 MOVE.L (A7)+,D0 | 201F
- +0002 000002 MOVEA.L (A7)+,A1 | 225F
- +0004 000004 MOVEA.L (A7)+,A0 | 205F
- +0006 000006 LSR.L #$1,D0 | E288
- +0008 000008 BCC.S memcpy+$000E | 6404
- +000A 00000A MOVE.B (A0)+,(A1)+ | 12D8
- +000C 00000C TST.L D0 | 4A80
- +000E 00000E BEQ.S memcpy+$0036 | 6726
- +0010 000010 LSR.L #$1,D0 | E288
- +0012 000012 BCC.S memcpy+$001A | 6406
- +0014 000014 MOVE.W (A0)+,(A1)+ | 32D8
- +0016 000016 TST.L D0 | 4A80
- +0018 000018 BEQ.S memcpy+$0036 | 671C
- +001A 00001A LSR.L #$1,D0 | E288
- +001C 00001C BCC.S memcpy+$0024 | 6406
- +001E 00001E MOVE.L (A0)+,(A1)+ | 22D8
- +0020 000020 TST.L D0 | 4A80
- +0022 000022 BEQ.S memcpy+$0036 | 6712
- +0024 000024 SUBQ.L #$1,D0 | 5380
- +0026 000026 MOVE.L (A0)+,(A1)+ | 22D8
- +0028 000028 MOVE.L (A0)+,(A1)+ | 22D8
- +002A 00002A DBF D0,memcpy+$0026 | 51C8 FFFA
- +002E 00002E SUBI.L #$00010000,D0 | 0480 0001 0000
- +0034 000034 BGT.S memcpy+$0026 | 6EF0
-
- memcpy 32 byte 0 alignment
- +0000 000000 MOVE.L (A7)+,D0 | 201F
- +0002 000002 MOVEA.L (A7)+,A1 | 225F
- +0004 000004 MOVEA.L (A7)+,A0 | 205F
- +0006 000006 LSR.L #$1,D0 | E288
- +0008 000008 BCC.S memcpy+$000E | 6404
- +000A 00000A MOVE.B (A0)+,(A1)+ | 12D8
- +000C 00000C TST.L D0 | 4A80
- +000E 00000E BEQ.S memcpy+$005E | 674E
- +0010 000010 LSR.L #$1,D0 | E288
- +0012 000012 BCC.S memcpy+$001A | 6406
- +0014 000014 MOVE.W (A0)+,(A1)+ | 32D8
- +0016 000016 TST.L D0 | 4A80
- +0018 000018 BEQ.S memcpy+$005E | 6744
- +001A 00001A LSR.L #$1,D0 | E288
- +001C 00001C BCC.S memcpy+$0024 | 6406
- +001E 00001E MOVE.L (A0)+,(A1)+ | 22D8
- +0020 000020 TST.L D0 | 4A80
- +0022 000022 BEQ.S memcpy+$005E | 673A
- +0024 000024 LSR.L #$1,D0 | E288
- +0026 000026 BCC.S memcpy+$0030 | 6408
- +0028 000028 MOVE.L (A0)+,(A1)+ | 22D8
- +002A 00002A MOVE.L (A0)+,(A1)+ | 22D8
- +002C 00002C TST.L D0 | 4A80
- +002E 00002E BEQ.S memcpy+$005E | 672E
- +0030 000030 LSR.L #$1,D0 | E288
- +0032 000032 BCC.S memcpy+$0040 | 640C
- +0034 000034 MOVE.L (A0)+,(A1)+ | 22D8
- +0036 000036 MOVE.L (A0)+,(A1)+ | 22D8
- +0038 000038 MOVE.L (A0)+,(A1)+ | 22D8
- +003A 00003A MOVE.L (A0)+,(A1)+ | 22D8
- +003C 00003C TST.L D0 | 4A80
- +003E 00003E BEQ.S memcpy+$005E | 671E
- +0040 000040 SUBQ.L #$1,D0 | 5380
- +0042 000042 MOVE.L (A0)+,(A1)+ | 22D8
- +0044 000044 MOVE.L (A0)+,(A1)+ | 22D8
- +0046 000046 MOVE.L (A0)+,(A1)+ | 22D8
- +0048 000048 MOVE.L (A0)+,(A1)+ | 22D8
- +004A 00004A MOVE.L (A0)+,(A1)+ | 22D8
- +004C 00004C MOVE.L (A0)+,(A1)+ | 22D8
- +004E 00004E MOVE.L (A0)+,(A1)+ | 22D8
- +0050 000050 MOVE.L (A0)+,(A1)+ | 22D8
- +0052 000052 DBF D0,memcpy+$0042 | 51C8 FFEE
- +0056 000056 SUBI.L #$00010000,D0 | 0480 0001 0000
- +005C 00005C BGT.S memcpy+$0042 | 6EE4
-
- memcpy 256 byte 0 alignment
- +0000 000000 MOVE.L (A7)+,D0 | 201F
- +0002 000002 MOVEA.L (A7)+,A1 | 225F
- +0004 000004 MOVEA.L (A7)+,A0 | 205F
- +0006 000006 LSR.L #$1,D0 | E288
- +0008 000008 BCC.S memcpy+$000E | 6404
- +000A 00000A MOVE.B (A0)+,(A1)+ | 12D8
- +000C 00000C TST.L D0 | 4A80
- +000E 00000E BEQ memcpy+$0118 | 6700 0108
- +0012 000012 LSR.L #$1,D0 | E288
- +0014 000014 BCC.S memcpy+$001E | 6408
- +0016 000016 MOVE.W (A0)+,(A1)+ | 32D8
- +0018 000018 TST.L D0 | 4A80
- +001A 00001A BEQ memcpy+$0118 | 6700 00FC
- +001E 00001E LSR.L #$1,D0 | E288
- +0020 000020 BCC.S memcpy+$002A | 6408
- +0022 000022 MOVE.L (A0)+,(A1)+ | 22D8
- +0024 000024 TST.L D0 | 4A80
- +0026 000026 BEQ memcpy+$0118 | 6700 00F0
- +002A 00002A LSR.L #$1,D0 | E288
- +002C 00002C BCC.S memcpy+$0038 | 640A
- +002E 00002E MOVE.L (A0)+,(A1)+ | 22D8
- +0030 000030 MOVE.L (A0)+,(A1)+ | 22D8
- +0032 000032 TST.L D0 | 4A80
- +0034 000034 BEQ memcpy+$0118 | 6700 00E2
- +0038 000038 LSR.L #$1,D0 | E288
- +003A 00003A BCC.S memcpy+$004A | 640E
- +003C 00003C MOVE.L (A0)+,(A1)+ | 22D8
- +003E 00003E MOVE.L (A0)+,(A1)+ | 22D8
- +0040 000040 MOVE.L (A0)+,(A1)+ | 22D8
- +0042 000042 MOVE.L (A0)+,(A1)+ | 22D8
- +0044 000044 TST.L D0 | 4A80
- +0046 000046 BEQ memcpy+$0118 | 6700 00D0
- +004A 00004A LSR.L #$1,D0 | E288
- +004C 00004C BCC.S memcpy+$0064 | 6416
- +004E 00004E MOVE.L (A0)+,(A1)+ | 22D8
- +0050 000050 MOVE.L (A0)+,(A1)+ | 22D8
- +0052 000052 MOVE.L (A0)+,(A1)+ | 22D8
- +0054 000054 MOVE.L (A0)+,(A1)+ | 22D8
- +0056 000056 MOVE.L (A0)+,(A1)+ | 22D8
- +0058 000058 MOVE.L (A0)+,(A1)+ | 22D8
- +005A 00005A MOVE.L (A0)+,(A1)+ | 22D8
- +005C 00005C MOVE.L (A0)+,(A1)+ | 22D8
- +005E 00005E TST.L D0 | 4A80
- +0060 000060 BEQ memcpy+$0118 | 6700 00B6
- +0064 000064 LSR.L #$1,D0 | E288
- +0066 000066 BCC.S memcpy+$008E | 6426
- +0068 000068 MOVE.L (A0)+,(A1)+ | 22D8
- +006A 00006A MOVE.L (A0)+,(A1)+ | 22D8
- +006C 00006C MOVE.L (A0)+,(A1)+ | 22D8
- +006E 00006E MOVE.L (A0)+,(A1)+ | 22D8
- +0070 000070 MOVE.L (A0)+,(A1)+ | 22D8
- +0072 000072 MOVE.L (A0)+,(A1)+ | 22D8
- +0074 000074 MOVE.L (A0)+,(A1)+ | 22D8
- +0076 000076 MOVE.L (A0)+,(A1)+ | 22D8
- +0078 000078 MOVE.L (A0)+,(A1)+ | 22D8
- +007A 00007A MOVE.L (A0)+,(A1)+ | 22D8
- +007C 00007C MOVE.L (A0)+,(A1)+ | 22D8
- +007E 00007E MOVE.L (A0)+,(A1)+ | 22D8
- +0080 000080 MOVE.L (A0)+,(A1)+ | 22D8
- +0082 000082 MOVE.L (A0)+,(A1)+ | 22D8
- +0084 000084 MOVE.L (A0)+,(A1)+ | 22D8
- +0086 000086 MOVE.L (A0)+,(A1)+ | 22D8
- +0088 000088 TST.L D0 | 4A80
- +008A 00008A BEQ memcpy+$0118 | 6700 008C
- +008E 00008E MOVEM.L D1-D7/A2-A6,-(A7) | 48E7 7F3E
- +0092 000092 LSR.L #$1,D0 | E288
- +0094 000094 BCC.S memcpy+$00BE | 6428
- +0096 000096 MOVEM.L (A0)+,D1-D7/A2-A6 | 4CD8 7CFE
- +009A 00009A MOVEM.L D1-D7/A2-A6,(A1) | 48D1 7CFE
- +009E 00009E ADDA.W #$0030,A1 | D2FC 0030
- +00A2 0000A2 MOVEM.L (A0)+,D1-D7/A2-A6 | 4CD8 7CFE
- +00A6 0000A6 MOVEM.L D1-D7/A2-A6,(A1) | 48D1 7CFE
- +00AA 0000AA ADDA.W #$0030,A1 | D2FC 0030
- +00AE 0000AE MOVEM.L (A0)+,D2-D7/A4/A5 | 4CD8 30FC
- +00B2 0000B2 MOVEM.L D2-D7/A4/A5,(A1) | 48D1 30FC
- +00B6 0000B6 ADDA.W #$0020,A1 | D2FC 0020
- +00BA 0000BA TST.L D0 | 4A80
- +00BC 0000BC BEQ.S memcpy+$0114 | 6756
- +00BE 0000BE SUBQ.L #$1,D0 | 5380
- +00C0 0000C0 MOVEM.L (A0)+,D1-D7/A2-A6 | 4CD8 7CFE
- +00C4 0000C4 MOVEM.L D1-D7/A2-A6,(A1) | 48D1 7CFE
- +00C8 0000C8 ADDA.W #$0030,A1 | D2FC 0030
- +00CC 0000CC MOVEM.L (A0)+,D1-D7/A2-A6 | 4CD8 7CFE
- +00D0 0000D0 MOVEM.L D1-D7/A2-A6,(A1) | 48D1 7CFE
- +00D4 0000D4 ADDA.W #$0030,A1 | D2FC 0030
- +00D8 0000D8 MOVEM.L (A0)+,D1-D7/A2-A6 | 4CD8 7CFE
- +00DC 0000DC MOVEM.L D1-D7/A2-A6,(A1) | 48D1 7CFE
- +00E0 0000E0 ADDA.W #$0030,A1 | D2FC 0030
- +00E4 0000E4 MOVEM.L (A0)+,D1-D7/A2-A6 | 4CD8 7CFE
- +00E8 0000E8 MOVEM.L D1-D7/A2-A6,(A1) | 48D1 7CFE
- +00EC 0000EC ADDA.W #$0030,A1 | D2FC 0030
- +00F0 0000F0 MOVEM.L (A0)+,D1-D7/A2-A6 | 4CD8 7CFE
- +00F4 0000F4 MOVEM.L D1-D7/A2-A6,(A1) | 48D1 7CFE
- +00F8 0000F8 ADDA.W #$0030,A1 | D2FC 0030
- +00FC 0000FC MOVEM.L (A0)+,D1-D4 | 4CD8 001E
- +0100 000100 MOVEM.L D1-D4,(A1) | 48D1 001E
- +0104 000104 ADDA.W #$0010,A1 | D2FC 0010
- +0108 000108 DBF D0,memcpy+$00C0 | 51C8 FFB6
- +010C 00010C SUBI.L #$00010000,D0 | 0480 0001 0000
- +0112 000112 BGT.S memcpy+$00C0 | 6EAC
- +0114 000114 MOVEM.L (A7)+,D1-D7/A2-A6 | 4CDF 7CFE
-
- End of machine code source. Pascal INLINEs in next article.
-
-
-
- - -------------------------
-
- From: orpheus@reed.edu (P. Hawthorne)
- Subject: Fatest code to fill memory?
- Date: 12 Feb 92 06:20:48 GMT
- Organization: Reed College, Portland OR
-
- Pascal INLINE source starts here.
-
- procedure CopyBlockInline;
- inline {This is the most general mover included. Does overlaps.}
- $226E, $000C, $206E, $0010, $202E, $0008, $2208, $D280, $B289, $5DC1, {}
- $B3C8, $5DC2, $8202, $6700, $00D6, $0C80, $0000, $00C4, $6E3A, $E288, {}
- $6404, $12D8, $4A80, $6700, $019A, $E288, $6408, $32D8, $4A80, $6700, {}
- $018E, $E288, $6408, $22D8, $4A80, $6700, $0182, $5380, $22D8, $22D8, {}
- $51C8, $FFFA, $0480, $0001, $0000, $6EF0, $6000, $016C, $0C80, $0000, {}
- $0D00, $6E00, $0084, $E288, $6406, $12D8, $4A80, $677A, $E288, $6406, {}
- $32D8, $4A80, $6770, $E288, $6406, $22D8, $4A80, $6766, $E288, $6408, {}
- $22D8, $22D8, $4A80, $675A, $E288, $640C, $22D8, $22D8, $22D8, $22D8, {}
- $4A80, $674A, $E288, $6414, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $22D8, $4A80, $6732, $5380, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $51C8, $FFDE, $0480, $0001, $0000, $6ED4, $6002, $A02E, $6000, {}
- $00DA, $0C80, $0000, $00C4, $6E3E, $D1C0, $D3C0, $E288, $6404, $1320, {}
- $4A80, $6700, $00C2, $E288, $6408, $3320, $4A80, $6700, $00B6, $E288, {}
- $6408, $2320, $4A80, $6700, $00AA, $5380, $2320, $2320, $51C8, $FFFA, {}
- $0480, $0001, $0000, $6EF0, $6000, $0094, $0C80, $0000, $0D00, $6E00, {}
- $0088, $D1C0, $D3C0, $E288, $6406, $1320, $4A80, $677A, $E288, $6406, {}
- $3320, $4A80, $6770, $E288, $6406, $2320, $4A80, $6766, $E288, $6408, {}
- $2320, $2320, $4A80, $675A, $E288, $640C, $2320, $2320, $2320, $2320, {}
- $4A80, $674A, $E288, $6414, $2320, $2320, $2320, $2320, $2320, $2320, {}
- $2320, $2320, $4A80, $6732, $5380, $2320, $2320, $2320, $2320, $2320, {}
- $2320, $2320, $2320, $2320, $2320, $2320, $2320, $2320, $2320, $2320, {}
- $2320, $51C8, $FFDE, $0480, $0001, $0000, $6ED4, $6002, $A02E; {}
-
- procedure CopyBlock (src, dst: univ Ptr; count: Longint);
- begin {Glue routine that assumes Think Pascal. Watch your registers.}
- CopyBlockInline;
- end;
-
- procedure CopyMem (src, dst: univ Ptr; count: Longint);
- inline
- $201F, $225F, $205F, $0C80, $0000, $00C4, $6E3A, $E288, $6404, $12D8, {}
- $4A80, $6700, $00BE, $E288, $6408, $32D8, $4A80, $6700, $00B2, $E288, {}
- $6408, $22D8, $4A80, $6700, $00A6, $5380, $22D8, $22D8, $51C8, $FFFA, {}
- $0480, $0001, $0000, $6EF0, $6000, $0090, $0C80, $0000, $0D00, $6E00, {}
- $0084, $E288, $6404, $12D8, $4A80, $677A, $E288, $6406, $32D8, $4A80, {}
- $6770, $E288, $6406, $22D8, $4A80, $6766, $E288, $6408, $22D8, $22D8, {}
- $4A80, $675A, $E288, $640C, $22D8, $22D8, $22D8, $22D8, $4A80, $674A, {}
- $E288, $6414, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $4A80, $6732, $5380, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $51C8, {}
- $FFDE, $0480, $0001, $0000, $6ED4, $6002, $A02E; {}
-
- procedure CopyMemBackwards (src, dst: univ Ptr; count: Longint);
- inline {Moves memory using predecrement mode}
- $201F, $225F, $205F, $0C80, $0000, $00C4, $6E3E, $D1C0, $D3C0, $E288, {}
- $6404, $1320, $4A80, $6700, $00C2, $E288, $6408, $3320, $4A80, $6700, {}
- $00B6, $E288, $6408, $2320, $4A80, $6700, $00AA, $5380, $2320, $2320, {}
- $51C8, $FFFA, $0480, $0001, $0000, $6EF0, $6000, $0094, $0C80, $0000, {}
- $0D00, $6E00, $0088, $D1C0, $D3C0, $E288, $6404, $1320, $4A80, $677A, {}
- $E288, $6406, $3320, $4A80, $6770, $E288, $6406, $2320, $4A80, $6766, {}
- $E288, $6408, $2320, $2320, $4A80, $675A, $E288, $640C, $2320, $2320, {}
- $2320, $2320, $4A80, $674A, $E288, $6414, $2320, $2320, $2320, $2320, {}
- $2320, $2320, $2320, $2320, $4A80, $6732, $5380, $2320, $2320, $2320, {}
- $2320, $2320, $2320, $2320, $2320, $2320, $2320, $2320, $2320, $2320, {}
- $2320, $2320, $2320, $51C8, $FFDE, $0480, $0001, $0000, $6ED4, $6002, {}
- $A02E; {}
-
- procedure Copy1 (src, dst: univ Ptr; count: Longint);
- inline
- $201F, $225F, $205F, $E288, $6404, $12D8, $4A80, $6710, $5380, $32D8, {}
- $51C8, $FFFC, $0480, $0001, $0000, $6EF2; {}
-
- procedure Copy2 (src, dst: univ Ptr; count: Longint);
- inline
- $201F, $225F, $205F, $E288, $6406, $12D8, $4A80, $671A, $E288, $6404, {}
- $32D8, $4A80, $6710, $5380, $22D8, $51C8, $FFFC, $0480, $0001, $0000, {}
- $6EF2; {}
-
- procedure Copy8 (src, dst: univ Ptr; count: Longint);
- inline
- $201F, $225F, $205F, $E288, $6404, $12D8, $4A80, $6736, $E288, $6406, {}
- $32D8, $4A80, $672C, $E288, $6406, $22D8, $4A80, $6722, $E288, $6408, {}
- $22D8, $22D8, $4A80, $6716, $5380, $22D8, $22D8, $22D8, $22D8, $51C8, {}
- $FFF6, $0480, $0001, $0000, $6EEC; {}
-
- procedure Copy32 (src, dst: univ Ptr; count: Longint);
- inline
- $201F, $225F, $205F, $E288, $6404, $12D8, $4A80, $6776, $E288, $6406, {}
- $32D8, $4A80, $676C, $E288, $6406, $22D8, $4A80, $6762, $E288, $6408, {}
- $22D8, $22D8, $4A80, $6756, $E288, $640C, $22D8, $22D8, $22D8, $22D8, {}
- $4A80, $6746, $E288, $6414, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $22D8, $4A80, $672E, $5380, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $51C8, $FFDE, $0480, $0001, $0000, $6ED4; {}
-
- procedure Copy64 (src, dst: univ Ptr; count: Longint);
- inline
- $201F, $225F, $205F, $E288, $6404, $12D8, $4A80, $6700, $00B2, $E288, {}
- $6408, $32D8, $4A80, $6700, $00A6, $E288, $6408, $22D8, $4A80, $6700, {}
- $009A, $E288, $640A, $22D8, $22D8, $4A80, $6700, $008C, $E288, $640C, {}
- $22D8, $22D8, $22D8, $22D8, $4A80, $677A, $E288, $6414, $22D8, $22D8, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $4A80, $6762, $E288, $6424, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $4A80, $673A, $48E7, $7F3E, {}
- $5380, $4CD8, $7CFE, $48D1, $7CFE, $D2FC, $0030, $4CD8, $7CFE, $48D1, {}
- $7CFE, $D2FC, $0030, $4CD8, $30FC, $48D1, $30FC, $D2FC, $0020, $51C8, {}
- $FFDA, $0480, $0001, $0000, $6ED0, $4CDF, $7CFE; {}
-
- procedure Copy128 (src, dst: univ Ptr; count: Longint);
- inline
- $201F, $225F, $205F, $E288, $6404, $12D8, $4A80, $6700, $0108, $E288, {}
- $6408, $32D8, $4A80, $6700, $00FC, $E288, $6408, $22D8, $4A80, $6700, {}
- $00F0, $E288, $640A, $22D8, $22D8, $4A80, $6700, $00E2, $E288, $640E, {}
- $22D8, $22D8, $22D8, $22D8, $4A80, $6700, $00D0, $E288, $6416, $22D8, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $4A80, $6700, $00B6, {}
- $E288, $6426, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $4A80, $6700, {}
- $008C, $48E7, $7F3E, $E288, $6428, $4CD8, $7CFE, $48D1, $7CFE, $D2FC, {}
- $0030, $4CD8, $7CFE, $48D1, $7CFE, $D2FC, $0030, $4CD8, $30FC, $48D1, {}
- $30FC, $D2FC, $0020, $4A80, $6756, $5380, $4CD8, $7CFE, $48D1, $7CFE, {}
- $D2FC, $0030, $4CD8, $7CFE, $48D1, $7CFE, $D2FC, $0030, $4CD8, $7CFE, {}
- $48D1, $7CFE, $D2FC, $0030, $4CD8, $7CFE, $48D1, $7CFE, $D2FC, $0030, {}
- $4CD8, $7CFE, $48D1, $7CFE, $D2FC, $0030, $4CD8, $001E, $48D1, $001E, {}
- $D2FC, $0010, $51C8, $FFB6, $0480, $0001, $0000, $6EAC, $4CDF, $7CFE; {}
-
- procedure Copy128CacheFriendly (src, dst: univ Ptr; count: Longint);
- inline
- $201F, $225F, $205F, $E288, $6404, $12D8, $4A80, $6700, $00AA, $E288, {}
- $6408, $32D8, $4A80, $6700, $009E, $E288, $6408, $22D8, $4A80, $6700, {}
- $0092, $E288, $640A, $22D8, $22D8, $4A80, $6700, $0084, $E288, $640C, {}
- $22D8, $22D8, $22D8, $22D8, $4A80, $6772, $E288, $6414, $22D8, $22D8, {}
- $22D8, $22D8, $22D8, $22D8, $22D8, $22D8, $4A80, $675A, $48E7, $7F3E, {}
- $E288, $641C, $4CD8, $0CFC, $48D1, $0CFC, $D2FC, $0020, $4CD8, $0CFC, {}
- $48D1, $0CFC, $D2FC, $0020, $4A80, $6732, $5380, $4CD8, $7CFE, $48D1, {}
- $7CFE, $D2FC, $0030, $4CD8, $7CFE, $48D1, $7CFE, $D2FC, $0030, $4CD8, {}
- $30FC, $48D1, $30FC, $D2FC, $0020, $51C8, $FFDA, $0480, $0001, $0000, {}
- $6ED0, $4CDF, $7CFE; {}
-
- End of Pascal INLINE source.
-
-
-
- - -------------------------
-
- From: lim@iris.ucdavis.edu (Lloyd Lim)
- Subject: BlockMove (was Re: Fatest code to fill memory?)
- Date: 12 Feb 92 10:23:01 GMT
- Organization: U.C. Davis - Department of Electrical Engineering and Computer Science
-
- In article <D88-JWA.92Feb11182842@hemul.nada.kth.se> d88-jwa@hemul.nada.kth.se (Jon W{tte) writes:
- >.ch> neeri@iis.ethz.ch (Matthias Ulrich Neeracher) writes:
- >
- > For an interesting study in loop unrolling, take a look at Apple's
- > implementation of _BlockMove.
- >
- >True. Especially on 040 ROMs where they move a cache line each time :-)
- >(Couldn't you just get the address of BlockMove and call that directly?
- >That might be fast enough !)
-
- The original post was about filling instead of moving but since we're
- off the subject... :-)
-
- TN 261 says that BlockMove invalidates the cache for sizes larger than
- 12 bytes because you could be moving code. I haven't seen anyone
- mention this here. Does it only invalidate addresses in the destination
- or does it invalidate the whole thing?
-
- I'd think that if it trashes the whole thing and you're just moving data
- (which is probably 99.99% of the time, or 100% if you are well-behaved),
- it'd be faster to call your own routine to move it. Even simple routines
- would probably beat a complete loss of the cache.
-
- +++
- Lloyd Lim Internet: lim@iris.cs.ucdavis.edu
- America Online: LimUnltd
- Compuserve: 72647,660
- US Mail: 224 Lysle Leach Hall, U.C. Davis, Davis, CA 95616
-
-
-
- - -------------------------
-
- From: ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University)
- Subject: Fastest code to fill memory?
- Date: 12 Feb 92 10:30:23 +1300
- Organization: University of Waikato, Hamilton, New Zealand
-
- (Reading comments from people about the awkwardness of using DBEQ in a fast
- loop...)
-
- Pardon me if I'm pointing out something obvious, but you people *do* realize
- there's a version of the DBcc instruction family that _doesn't_ test the
- condition codes, don't you?
-
- Lawrence D'Oliveiro fone: +64-7-856-2889
- Computer Services Dept fax: +64-7-838-4066
- University of Waikato electric mail: ldo@waikato.ac.nz
- Hamilton, New Zealand 37^ 47' 26" S, 175^ 19' 7" E, GMT+13:00
-
-
-
- - -------------------------
-
- From: taihou@iss.nus.sg (Tng Tai Hou)
- Subject: Fast memory fill results!
- Organization: Institute of Systems Science, NUS, Singapore
- Date: Wed, 12 Feb 1992 11:12:00 GMT
-
- I asked for help recently on the subject. I received more than 20 replies.
- Thanks to you folks, I have written two versions of what I think is
- the fastest MemSet yet!!! Or at least, the fastest with my current
- knowledge. I believe I have not taken full advantage of the cache in
- the 68020, 030 and 040, or faster opcodes. Maybe someone can enlighten me.
-
- /*
- This is the version in C. It first computes the remainder of count%4.
- If zero, it performs longword memory fills. Otherwise, it rounds count
- down to the nearest longword boundary, fills that much, and then fills
- the remainder (1, 2, or 3) in bytes. Note that value is a longword, and
- each of its 4 bytes contains the value (in my case an 8-bit color value).
- */
- void
- MyAMemSet (register unsigned long *buf, register long value, register
- unsigned long count)
- {
- register long rem = count%4; /* rem = count & 0x00000003 */
- register unsigned long i; /* not int: 16 bits by default in THINK C, too small for count */
- register unsigned char *p, val;
-
- if (rem == 0)
- {
- for (i=0; i<count; i+=4)
- *buf++ = value;
- }
- else {
- count &= 0xFFFFFFFC; /* count = count/4*4 */
- for (i=0; i<count; i+=4)
- *buf++ = value;
- val = value & 0x000000FF;
- p = (unsigned char*)buf;
- for (i=0; i<rem; i++)
- *p++ = val;
- }
- }
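The remark that every byte of the longword carries the same 8-bit value can be made concrete. What follows is only a sketch, not the poster's code: it assumes the fixed-width types from <stdint.h> (which postdate THINK C 5.0), and the names replicate_byte and fill_bytes are hypothetical. It keeps the same shape as the routine above: whole longwords first, then the 1 to 3 leftover bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy an 8-bit value into all four bytes of a longword. */
static uint32_t replicate_byte(uint8_t v)
{
    return (uint32_t)v * 0x01010101u;
}

/* Hypothetical fill: whole longwords first, then the 0..3 leftover bytes.
   Each longword is stored with memcpy so that the sketch does not assume
   buf is 4-byte aligned. */
static void fill_bytes(void *buf, uint8_t v, size_t count)
{
    unsigned char *p = (unsigned char *)buf;
    uint32_t pattern = replicate_byte(v);
    size_t longs = count / 4;   /* number of whole longwords */
    size_t rem = count % 4;     /* trailing bytes */

    assert(buf != NULL || count == 0);

    while (longs-- > 0) {
        memcpy(p, &pattern, sizeof pattern);
        p += sizeof pattern;
    }
    while (rem-- > 0)
        *p++ = v;
}
```

Building the pattern by multiplication keeps the caller from having to assemble the four identical bytes by hand, which is the precondition the routine above places on its value argument.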
-
-
- /*
- This is a completely handcrafted version.
- */
- void
- MyAMemSet (/*register unsigned long *buf, register long value, register
- unsigned long count*/)
- {
- asm {
- move.l 4(sp), d0
- movea.l d0, a0 /* a0 = buf */
- move.l 8(sp), d1 /* d1 = value */
- move.l 12(sp), d2 /* d2 = count */
-
- move.l d2, d3 /* d3 = rem */
- and.l #0x00000003, d3
- bne.s @else
-
- @5: move.l d1, (a0)+ /* do *buf++ = value */
- subq.l #4, d2
- bne.s @5
- bra @2
-
- @else: and.l #0xfffffffc, d2
- beq.s @3 /* count < 4: no whole longwords to fill */
- @1: move.l d1, (a0)+ /* do *buf++ = value */
- subq.l #4, d2
- bne.s @1
-
- @3: move.b d1, (a0)+
- subq.l #1, d3
- bne.s @3
-
- @2:
- }
- }
-
-
- I appreciate all comments and criticisms. Please post to the newsgroup
- for everyone's benefit. Thanks.
-
- Tai Hou
- Singapore
-
-
-
- - -------------------------
-
- From: ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University)
- Subject: Fast memory fill results!
- Date: 13 Feb 92 10:14:49 +1300
- Organization: University of Waikato, Hamilton, New Zealand
-
- In article <1992Feb12.111200.25672@nuscc.nus.sg>, taihou@iss.nus.sg
- (Tng Tai Hou) offers a handcrafted memset routine, which I think can
- be sped up a little more.
-
- > /*
- > This is a completely handcrafted version.
- > */
- > void
- > MyAMemSet (/*register unsigned long *buf, register long value, register
- > unsigned long count*/)
- > {
- > asm {
- > move.l 4(sp), d0
- > movea.l d0, a0 /* a0 = buf */
- > move.l 8(sp), d1 /* d1 = value */
- > move.l 12(sp), d2 /* d2 = count */
- >
- > move.l d2, d3 /* d3 = rem */
- > and.l #0x00000003, d3
- > bne.s @else
- >
- > @5: move.l d1, (a0)+ /* do *buf++ = value */
- > subq.l #4, d2
- > bne.s @5
- > bra @2
-
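- how about this for a replacement of the sequence from @5 (d2 still holds
- the byte count at that point, so it is converted to a longword count first):
-
- lsr.l #2, d2
- bra.s @59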
- @51: swap d2
- @52: move.l d1, (a0)+
- @59: dbra d2, @52
- swap d2
- dbra d2, @51
- bra @2
- >
- > @else: and.l #0xfffffffc, d2
- > @1: move.l d1, (a0)+ /* do (*buf++ = value */
- > subq.l #4, d2
- > bne.s @1
-
- how about:
-
- @else: lsr.l #2, d2
- bra.s @19
- @11: swap d2
- @12: move.l d1, (a0)+
- @19: dbra d2, @12
- swap d2
- dbra d2, @11
- >
- > @3: move.b d1, (a0)+
- > subq.l #1, d3
- > bne.s @3
-
- This loop will never iterate more than three times (rem is 1, 2, or 3),
- so it's probably not worth speeding up.
-
- >
- > @2:
- > }
- > }
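The swap/dbra pairs suggested above work around the 16-bit counter that dbra uses. As a sketch only, the control flow can be modeled in C to check that the body runs exactly once per unit of a 32-bit count. The function names are hypothetical, and the model assumes d2 already holds the number of longwords to store.

```c
#include <assert.h>
#include <stdint.h>

/* Model of DBRA Dn,label: decrement the low 16 bits of the register and
   report whether the branch is taken (low word has not wrapped to 0xFFFF). */
static int dbra(uint32_t *reg)
{
    uint16_t low = (uint16_t)(*reg & 0xFFFFu);
    low = (uint16_t)(low - 1u);
    *reg = (*reg & 0xFFFF0000u) | low;
    return low != 0xFFFFu;
}

/* Model of SWAP Dn: exchange the two 16-bit halves. */
static void swap_halves(uint32_t *reg)
{
    *reg = (uint32_t)(*reg << 16) | (*reg >> 16);
}

/* Hypothetical driver: counts how often the loop body (the move.l) runs,
   following the bra.s / swap / dbra control flow of the suggestion. */
static uint32_t nested_dbra_iterations(uint32_t count)
{
    uint32_t d2 = count;
    uint32_t body = 0;

    for (;;) {
        while (dbra(&d2))      /* inner dbra: loop on the low word */
            body++;            /* the move.l at the inner label */
        swap_halves(&d2);      /* bring the high word down */
        if (!dbra(&d2))        /* outer dbra falls through: done */
            break;
        swap_halves(&d2);      /* low word is now 0xFFFF */
        body++;                /* the move.l runs once before the inner dbra */
    }
    return body;
}

/* Self-check: the body count equals the 32-bit count, including counts
   above 65535 that a single dbra could not express. */
static void check_nested_dbra(void)
{
    assert(nested_dbra_iterations(0) == 0);
    assert(nested_dbra_iterations(65536u) == 65536u);
    assert(nested_dbra_iterations(70000u) == 70000u);
}
```

Entering the loop at the dbra rather than at the store is what makes a count of zero store nothing.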
-
- Lawrence D'Oliveiro
- One-trick asm pony
-
-
-
- - -------------------------
-
- From: twillis@ec.ecn.purdue.edu (Thomas E Willis)
- Subject: Fatest code to fill memory?
- Organization: Electrical Engineering, Purdue University
- Date: Thu, 13 Feb 1992 15:00:36 GMT
-
- In article <1992Feb11.180644.21941@ima.isc.com> suitti@ima.isc.com (Stephen Uitti) writes:
-
- [tons of stuff deleted]
-
- >zero". In fact, given that "dbeq" is so complicated, I wouldn't
- >be surprised if "subq.l"/"bne.s" weren't faster, or at least
- >nearly the same speed as "dbeq". I'd say, off hand, that "dbeq"
- >is useless.
-
- if i remember my 68k times right, db<cc> is faster than a subq/bne pair.
- to get around the testing for equal problem, use "dbf" (which always takes
- the branch unless the register decrements to -1). you do have to "bias"
- your loop counter since you're looping until the index register hits -1.
- for example:
-
- moveq #9,d0 ; causes us to do the move thing 10 times
- @1: move.l d1, (a0)+
- dbf d0, @1
-
- this should also work if d1 happens to be 0 (which would cause the dbeq
- code to fall through on the first iteration).
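The bias described above can be modeled in C. This is a sketch under the assumption that only the low 16 bits of the counter take part, as with dbf on the 68000 family; the names dbf_loop_iterations and dbf_bias are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a dbf-controlled loop: the body runs, then the 16-bit counter is
   decremented, and the loop exits once the counter wraps to -1 (0xFFFF).
   Returns the number of times the body ran. */
static uint32_t dbf_loop_iterations(uint16_t initial)
{
    uint16_t d0 = initial;
    uint32_t body = 0;

    do {
        body++;                     /* move.l d1, (a0)+ */
        d0 = (uint16_t)(d0 - 1u);   /* dbf decrements the low word */
    } while (d0 != 0xFFFFu);        /* branch unless the counter hit -1 */
    return body;
}

/* Hypothetical helper: the biased value to load for n iterations.
   Valid for 1 <= n <= 65536, the most a single 16-bit counter can express. */
static uint16_t dbf_bias(uint32_t n)
{
    assert(n >= 1 && n <= 65536u);
    return (uint16_t)(n - 1u);
}
```

Loading 9 gives ten passes, as in the moveq example, and the stored value plays no part in the loop test.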
-
- dbeq is useful in other situations when you want to get out of the loop on
- equal (moving bytes until the first zero?), but it has some "features" that
- make it not quite right for blasting bits around. kinda the old "right tool
- for the job" type question.
-
- just my $0.02 worth...
- --
- - t
- - -------------------------------------------------------------------------
- Tom Willis / "These are dangerous days, to say what you feel
- Purdue Electrical Engr. / is to dig your own grave." - Sinead O'Connor,
- twillis@ecn.purdue.edu / "Black Boys on Mopeds"
-
-
-
- - -------------------------
-
- From: ahbritto@iat.com (Arthur H. Britto)
- Subject: Fatest code to fill memory?
- Date: 13 Feb 92 03:02:57 GMT
- Organization: Information Access Technologies, Inc. of Berkeley, CA
-
- jesjones@milton.u.washington.edu (Jesse Jones) writes:
-
- > Chris Tate is right: assembly code is definitely the best way to go
- >if you want your graphic routines to run as fast as possible. The code
- >he has is fine as long as you remember that the decrement and branch
- >instructions are restricted to word length counter registers.
-
- Faster than the MOVE.L is the MOVEM instruction. Save off all the registers
- you can and then use the MOVEM instruction. Of course, you should unwrap
- your loops for maximum effect. Better yet, have no loops and calculate a
- branch into the code.
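In C, the closest analogue of calculating a branch into unrolled code is the construct known as Duff's device, where a switch jumps into the middle of a run of stores. The sketch below is an illustration of that idea, not the code the post refers to. It keeps an outer loop so that any count works, uses MOVE.L-sized stores rather than MOVEM, and the name fill_longs_unrolled is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill n 32-bit words, 8 stores per pass. The switch jumps into the middle
   of the unrolled run so the n % 8 leftover stores are done on the first
   pass; every later pass executes all 8 stores. */
static void fill_longs_unrolled(uint32_t *dst, uint32_t value, size_t n)
{
    size_t passes;

    assert(dst != NULL || n == 0);
    if (n == 0)
        return;

    passes = (n + 7) / 8;          /* passes through the unrolled run */
    switch (n % 8) {               /* entry point = computed branch */
    case 0: do { *dst++ = value;
    case 7:      *dst++ = value;
    case 6:      *dst++ = value;
    case 5:      *dst++ = value;
    case 4:      *dst++ = value;
    case 3:      *dst++ = value;
    case 2:      *dst++ = value;
    case 1:      *dst++ = value;
            } while (--passes > 0);
    }
}
```

If the largest fill size is known in advance, the run can be made that long and the do/while dropped, which leaves only the computed entry point.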
-
- This is how Armor Alley does it.
-
- Arthur Britto
- --
- - ----------------------------------------------------------------------------
- Information Access Technologies, Inc. Internet: ahbritto@iat.com
- 46 Shattuck Square, Suite 11 Applelink: ahbritto@iat.com@internet#
- Berkeley, CA 94704-1152 Voice: 510-704-0160 Fax: 510-704-8019
-
-
-
- - -------------------------
-
- From: deadman@garnet.berkeley.edu (Ben Haller)
- Subject: Fatest code to fill memory?
- Date: 15 Feb 92 00:02:58 GMT
- Organization: Stick Software
-
- In article <1992Feb13.030257.13145@iat.com> ahbritto@iat.com (Arthur H. Britto) writes:
- >Of course, you should unwrap your loops for maximum effect.
-
- This is a common fallacy. In fact, on any Mac upwards of the 68000-based
- ones, unrolling loops kills your performance unless you are very careful.
- You need to make sure that your loop will fit inside the cache size of the
- chip you're running on (different chips have different sizes, too). In
- general if you're doing memory accesses (filling memory with a value) the
- instructions will pipeline such that the loop branch instruction
- effectively takes a negligible amount of time anyway. Unrolling may
- speed things up, but beware - if you unroll it too far, you'll start
- thrashing the instruction cache, and your performance will go down the
- tubes.
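A rough way to apply this advice is to add up the code bytes of an unrolled loop and compare them with the instruction cache of the target chip. The sketch below does only that arithmetic. The opcode sizes are read off the disassembly earlier in this thread (2 bytes per MOVE.L, 4 bytes per DBF). The cache sizes are not from the post; they are the commonly documented on-chip instruction cache sizes. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Opcode sizes in bytes, as shown in the disassembly in this thread. */
enum { MOVE_L_BYTES = 2, DBF_BYTES = 4 };

/* On-chip instruction cache sizes in bytes (the 68000 has no cache). */
enum { ICACHE_68020 = 256, ICACHE_68030 = 256, ICACHE_68040 = 4096 };

/* Code bytes in a loop of `moves` MOVE.L instructions closed by one DBF. */
static size_t unrolled_loop_bytes(size_t moves)
{
    return moves * MOVE_L_BYTES + DBF_BYTES;
}

/* Necessary but not sufficient: this ignores cache-line alignment and any
   other code that competes for the cache. */
static int loop_fits(size_t moves, size_t icache_bytes)
{
    return unrolled_loop_bytes(moves) <= icache_bytes;
}

/* Self-check against the listing: the 32-byte copy loop is 8 MOVE.L plus
   one DBF, 20 bytes of code. */
static void check_listing_sizes(void)
{
    assert(unrolled_loop_bytes(8) == 20);
    assert(loop_fits(8, ICACHE_68020));
}
```

By this count a loop of 126 MOVE.L instructions is the most that fits in 256 bytes. It says nothing about whether unrolling that far is actually faster, which only measurement on the target machine can show.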
-
- -Ben Haller (deadman@garnet.berkeley.edu)
-
-
-
- ---------------------------
-
- End of C.S.M.P. Digest
- **********************
-